C# Speech Recognition - Is this what the user said?

Asked 22/10, 2008 at 19:4 Answered 20/7, 2016 at 0:44

I need to write an application that uses a speech recognition engine (either the one built into Vista or a third-party one) to display a word or phrase and recognise when the user reads it, or an approximation of it. I also need to be able to switch quickly between languages without changing the language of the operating system.

The users will be using the system for very short periods. The application needs to work without the requirement of first training the recognition engine to the users' voices.

It would also be fantastic if this could work on Windows XP or lesser versions of Windows Vista.

Optionally, the system needs to be able to read information on the screen back to the user, in the user's selected language. I can work around this specification using pre-recorded voice-overs, but the preferred method would be to use a text-to-speech engine.

Can anyone recommend something for me?

Sanatorium answered 22/10, 2008 at 19:4 Comment(2)

Please clarify... What do you mean? Do you mean a recognition engine? A structure for the application? Whether you should even attempt to do it? – Stubblefield 22/10, 2008 at 19:6

I am mainly looking for an engine to use. I need to be able to tell my managers whether or not the idea is feasible. I already have a rough idea of how to structure the application around the engine, all I need is to plug the engine in. – Sanatorium 22/10, 2008 at 19:15

A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this...with some limitations. Add System.Speech (should be in the GAC) to your project. Here's some sample code for a WinForms app:

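using System;
using System.Speech.Recognition;
using System.Windows.Forms;

public partial class Form1 : Form
{
  // Shared (desktop) recognizer: relies on the Windows speech recognition UI.
  SpeechRecognizer rec = new SpeechRecognizer();

  public Form1()
  {
    InitializeComponent();
    rec.SpeechRecognized += rec_SpeechRecognized;
  }

  // Show whichever phrase from the grammar the engine matched.
  void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    lblLetter.Text = e.Result.Text;
  }

  // Must be hooked up to the form's Load event (for example in the designer).
  void Form1_Load(object sender, EventArgs e)
  {
    // Build a grammar limited to the strings "0" through "100".
    var c = new Choices();
    for (var i = 0; i <= 100; i++)
      c.Add(i.ToString());
    var gb = new GrammarBuilder(c);
    var g = new Grammar(gb);
    rec.LoadGrammar(g);
    rec.Enabled = true;
  }
}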

This recognizes the numbers from 0 to 100 and displays the recognized number on the form. You'll need a form with a label called lblLetter on it.

System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)

It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).

As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(

As for text-to-speech, System.Speech.Synthesis does this. It's even easier than the speech recognition. I wrote a small program to let me type, hit Enter, and read the text aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")
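A minimal sketch of that kind of "type a line, hear it read back" program, assuming a console app with a reference to System.Speech and at least one installed TTS voice. The class name SpeakTypedText is made up for illustration; the voices listed depend entirely on what is installed on the machine.

```csharp
using System;
using System.Speech.Synthesis;

// Hypothetical example class: reads typed lines aloud until an empty line is entered.
class SpeakTypedText
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // Send audio to the default playback device.
            synth.SetOutputToDefaultAudioDevice();

            // List the installed voices and their languages.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
                Console.WriteLine("{0} ({1})", voice.VoiceInfo.Name, voice.VoiceInfo.Culture);

            // Speak each typed line; an empty line (or end of input) quits.
            string line;
            while (!string.IsNullOrEmpty(line = Console.ReadLine()))
                synth.Speak(line);
        }
    }
}
```

To read text in a specific language, one of the listed voice names can be passed to synth.SelectVoice before speaking, provided a voice for that language is installed.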

Paquette answered 22/10, 2008 at 19:31 Comment(6)

How would I adapt this code to recognise 1 - 100 in French or German without needing to change the OS display language? – Sanatorium 23/10, 2008 at 7:7

The only language possible is the one of your OS. I just read it from MSDN. – Scend 17/3, 2009 at 13:18

I think your comment that "only works with a pre-defined list of words or phrases" is not true. The desktop recognizer in Vista and later includes a dictation grammar that you can load. See msdn.microsoft.com/en-us/library/… – Francophile 18/10, 2010 at 19:59

+1 for horrible sales from Nuance / Dragon. We had to practically beg to get them to sell us their software (for a very non-trivial price). – Chrisoula 14/1, 2011 at 22:27

Hey... I tried to use this code... In my form I have only the label (is that correct?). I run the application, the speech recognizer starts working, I say for example "four", and the speech recognizer program always shows "What was that?" and nothing happens to the label. I even put a breakpoint in Form1_Load and it never gets there... any suggestions? thanks :) – Dillondillow 29/4, 2012 at 18:50

@user990635, make sure your microphone is working. Try recording in some other program such as the Windows Sound Recorder. Also, make sure your microphone is a decent one. Some microphones' quality isn't good enough for speech to be understood by a computer. – Paquette 30/4, 2012 at 4:33

[Note: I was the development lead for the managed speech recognition API in .NET 3.0]

System.Speech is part of .NET 3.0, so it is available on both Vista and XP. On Vista you have the added benefit of a speech recognition engine pre-installed by the OS. On XP your choices are: use the SAPI 5.1 SDK, which ships a very old engine (though it might work well enough for your command-and-control scenario), or install Office 2003, which installs a newer version of the recognizer. There are a few SAPI 5 compliant speech recognition engines available as well.

If you need to switch languages, you will want to use the System.Speech.Recognition.SpeechRecognitionEngine class which allows you to choose the SR engine for the language you need to support. Note that engines are defined by a set of languages they support (they might be using the same binary, only swapping data files to support additional languages).
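As a rough sketch of what this could look like, the following console program lists the installed recognizers and then creates an engine for one culture. It assumes a French (fr-FR) recognizer is installed, which is not guaranteed; the constructor throws if no engine supports the requested culture. The class name and the three French words are only illustrative.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

// Hypothetical example class: pick a recognizer by language at run time.
class LanguageSwitchSketch
{
    static void Main()
    {
        // Show which recognition engines (and languages) are installed.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
            Console.WriteLine("{0}: {1}", info.Culture.Name, info.Description);

        // Assumption: a recognizer for fr-FR is installed on this machine.
        var culture = new CultureInfo("fr-FR");

        using (var engine = new SpeechRecognitionEngine(culture))
        {
            // The grammar's culture has to match the engine's culture.
            var builder = new GrammarBuilder { Culture = culture };
            builder.Append(new Choices("un", "deux", "trois"));
            engine.LoadGrammar(new Grammar(builder));

            engine.SetInputToDefaultAudioDevice();
            engine.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);

            // Keep listening until the user presses Enter.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();
        }
    }
}
```

Switching language then amounts to disposing the engine and creating a new one for another culture, with a grammar built for that culture.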

Comment if you need to know more.

Philipp

Clardy answered 4/11, 2008 at 21:35 Comment(5)

Phillip, if I want to use the engine and train it to learn and recognize spoken Croatian, as a way of transcribing various speakers, is that possible, and if it is, where do I start? – Battik 23/10, 2010 at 21:43

There are 2 parts to a speech recognizer: acoustic models and language model. You can use the Vista Dictation Resource Kit (or something like that) to build a dictation language model that references Croatian words. There are currently no tools to train the acoustic models which you would want to do if there are sounds in Croatian that are not present in English (or whatever existing SR language you are using). You can specify custom pronunciations for your Croatian words to improve your recognition accuracy. – Clardy 29/10, 2010 at 16:17

Phillip, I have been playing with the libraries you suggested, but I have had a very hard time getting it to recognize what I said. I do have a slight accent, but I have not had this experience with other devices, like Kinect and so on. Should I just assume that my mic is not good enough, or am I doing something wrong? Do you have any suggestions? P.S. I am using the example from the MSDN documentation. – Liggitt 3/5, 2013 at 1:0

@Liggitt Are you using grammar-based recognition? That's what Xbox with Kinect is using. It will be much more accurate (but also more restrictive) than dictation. – Clardy 3/5, 2013 at 16:46

@Philipp this is the example I used: msdn.microsoft.com/en-us/library/vstudio/…. I don't think it's grammar-based. What I meant about Kinect is that my accent did not matter when I was using Kinect on my Xbox, not that I was programming against it. – Liggitt 3/5, 2013 at 21:30

Before using this code, add a reference to the System.Speech assembly to your project.

I found that the code example posted by Kyralessa on Oct 22nd didn't work for me, but a slightly revised version did. When adding strings to the Choices object, use full-text English words, not digits. It seems the MS speech recognition engine can't recognize numbers written as digits.

I have marked these modifications with comments added to the previous example.

public partial class Form1 : Form
{
  SpeechRecognizer rec = new SpeechRecognizer();

  public Form1()
  {
    InitializeComponent();
    rec.SpeechRecognized += rec_SpeechRecognized;
  }

  void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    lblLetter.Text = e.Result.Text;
  }

  void Form1_Load(object sender, EventArgs e)
  {
    var c = new Choices();

    // Doesn't work: you must add English words to Choices to
    // populate the grammar.
    //
    //for (var i = 0; i <= 100; i++)
    //  c.Add(i.ToString());

    c.Add("one");
    c.Add("two");
    c.Add("three");
    c.Add("four");
    // etc...

    var gb = new GrammarBuilder(c);
    var g = new Grammar(gb);
    rec.LoadGrammar(g);
    rec.Enabled = true;
  }
}

Swop answered 25/11, 2008 at 5:32 Comment(2)

Not sure why it didn't work for you; the code I posted came directly from a program I wrote and used, and it worked for me. Perhaps it's related to the culture settings on your system? – Paquette 19/11, 2009 at 14:25

Could be. I didn't look into this extremely in depth. – Swop 8/12, 2009 at 20:9

If the engine is what you're asking about then I've found (beware, I'm just listing, I haven't tried any of them):

Lumenvox engine

You also have the SAPI SDK from Microsoft itself. I've only tried it for text-to-speech, but according to its description:

The SDK also includes freely distributable text-to-speech (TTS) engines (in U.S. English and Simplified Chinese) and speech recognition (SR) engines (in U.S. English, Simplified Chinese, and Japanese).

Columbus answered 22/10, 2008 at 19:14 Comment(1)

The Lumenvox engine looks like it might do the trick! I'm going to have to play a bit with it to be certain. Also need to discuss pricing with the managers. Thanks Jorge! – Sanatorium 22/10, 2008 at 19:20

Be warned that you're not going to get good results if you don't require training first. Speech recognition is a statistical application of phonetics, a field which is pretty frank about the fact that there's so much variation in the signal that it's almost a miracle anyone can understand what anyone else says. An off-the-shelf speech recognition engine will most likely tend towards a more general accent of English, but will fail miserably for anything even slightly different.

That's why training is so important. We can do well by overfitting with ease, especially if we reduce the problem space. But creating an extensible machine learning solution? Therein always lies the rub.

That being said, consider Sphinx-4. It's an off-the-shelf solution written in Java, available at http://cmusphinx.sourceforge.net/sphinx4/

Isabeau answered 22/10, 2008 at 19:23 Comment(1)

+1 for the warning about not learning, am just wishing there was a .NET port for Sphinx-4. – Fitzpatrick 22/7, 2011 at 11:57

Check out the new Speech class libraries in .NET 3.5

http://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognizer.aspx

General documentation for SR and TTS:

http://msdn.microsoft.com/en-us/library/system.speech.recognition.aspx

http://msdn.microsoft.com/en-us/library/system.speech.synthesis.aspx

Goodson answered 22/10, 2008 at 19:21 Comment(0)

Dragon Naturally Speaking SDK might be worth looking at. This project looked interesting.

Haven't got to play with either of them though.

Comprehension answered 22/10, 2008 at 19:17 Comment(2)

Dead link and the "This Project" is about text to speech, not speech recognition – Ascension 11/4, 2012 at 19:19

@Ascension Did you read the latter part of the question before you down-voted me? He was looking for both a speech recognition piece and text-to-speech. No surprise here that a 3+ year old link is dead. I updated it. – Comprehension 11/4, 2012 at 19:45

Text-to-speech is available with the Speech API. Personally, I'd probably require Vista and use the managed interfaces in System.Speech.Recognition and System.Speech.Synthesis, but P/Invoke into the unmanaged APIs should be possible if you really need XP support.

Rachellrachelle answered 22/10, 2008 at 19:22 Comment(0)

Try Microsoft Speech Server, which I think is now part of Office Communications Server 2007. It contains SR/TTS engines, a C# API, and tools that integrate with Visual Studio.

Mutinous answered 4/11, 2008 at 21:41 Comment(0)

This is the article from MSDN magazine that first discussed using the System.Speech APIs for Vista. Some of it is out of date because the API changed between beta (when the article was written) and the release of Vista, but this is still one of the best resources I've found and covers a good intro to the System.Speech namespace. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx

Francophile answered 9/6, 2010 at 18:29 Comment(0)

Well, this question already has many good responses, but I think it is worth updating the responses from Rob Segal and Philipp Schmid with some information from the 2016 documentation, which includes this nice code example:

https://msdn.microsoft.com/en-us/library/office/system.speech.recognition.speechrecognitionengine.aspx

It does not use the shared Windows recognizer (the little Windows microphone that shows up in the middle of the screen); it uses an in-app SpeechRecognitionEngine that does not need any visual cue. The UI is completely under your control.
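For the original "did the user say the phrase on screen?" scenario, a minimal sketch using the in-process engine might look like the following. It is not the MSDN sample; the PhraseCheck class, the IsMatch helper, the 5-second timeout and the 0.6 confidence threshold are all assumptions chosen for illustration.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical helper: prompts for one phrase and checks what was heard.
static class PhraseCheck
{
    // True when the recognized text equals the expected phrase (ignoring case
    // and surrounding whitespace) and the engine's confidence is high enough.
    public static bool IsMatch(string expected, string heard, float confidence, float minConfidence)
    {
        return heard != null
            && confidence >= minConfidence
            && string.Equals(expected.Trim(), heard.Trim(), StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        const string prompt = "good morning";

        // In-process engine: no shared Windows recognizer UI is shown.
        using (var engine = new SpeechRecognitionEngine())
        {
            // Grammar containing only the phrase the user is asked to read.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(prompt)));
            engine.SetInputToDefaultAudioDevice();

            Console.WriteLine("Please say: " + prompt);

            // Wait up to 5 seconds for speech to start; null means nothing was recognized.
            RecognitionResult result = engine.Recognize(TimeSpan.FromSeconds(5));

            bool ok = result != null
                && IsMatch(prompt, result.Text, result.Confidence, 0.6f);
            Console.WriteLine(ok ? "Heard it." : "Not recognized.");
        }
    }
}
```

Because the grammar holds a single phrase, anything the engine accepts will be that phrase, so the confidence threshold does most of the work and would need tuning against real speakers.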

Janiecejanifer answered 20/7, 2016 at 0:44 Comment(0)
